On the Capacity of Multiplicative Finite-Field Matrix Channels

Roberto W. Nobrega, Student Member, IEEE, Danilo Silva, Member, IEEE, and 
Bartolomeu F. Uchoa-Filho, Senior Member, IEEE 



Abstract — This paper deals with the multiplicative finite-field 
matrix channel, a discrete memoryless channel whose input and 
output are matrices (over a finite field) related by a multiplicative 
transfer matrix. The model considered here assumes that all 
transfer matrices with the same rank are equiprobable, so that 
the channel is completely characterized by the rank distribution 
of the transfer matrix. This model is seen to be more flexible than 
previously proposed ones in describing random linear network 
coding systems subject to link erasures, while still being suffi- 
ciently simple to allow tractability. The model is also conservative 
in the sense that its capacity provides a lower bound on the 
capacity of any channel with the same rank distribution. A main 
contribution is to express the channel capacity as the solution 
of a convex optimization problem which can be easily solved 
by numerical computation. For the special case of constant-rank 
input, a closed-form expression for the capacity is obtained. The 
behavior of the channel for asymptotically large field size or 
packet length is studied, and it is shown that constant-rank 
input suffices in this case. Finally, it is proved that the well- 
known approach of treating inputs and outputs as subspaces is 
information-lossless even in this more general model. 

Index Terms — Channel capacity, finite-field matrix channel, 
multiplicative matrix channel, noncoherent network coding, ran- 
dom linear network coding, subspace coding. 

I. Introduction 

Finite-field matrix channels are communication channels
where both the input and the output are matrices over some
finite field $\mathbb{F}_q$. The interest in such channels has been rising
since the seminal work of Koetter and Kschischang [3], which
connects finite-field matrix channels to the problem of error
control in noncoherent network coding. In contrast with the
combinatorial framework of [3], the present paper follows
[4]–[6] and adopts a probabilistic approach.

The object of study of this work is the multiplicative
finite-field matrix channel (MMC), modeled by the law

$$\mathbf{Y} = \mathbf{G}\mathbf{X}, \tag{1}$$

where $\mathbf{X} \in \mathbb{F}_q^{n \times \ell}$ is the channel input matrix, $\mathbf{Y} \in \mathbb{F}_q^{m \times \ell}$
is the channel output matrix, and $\mathbf{G} \in \mathbb{F}_q^{m \times n}$ is the channel
transfer matrix, with $\mathbf{X}$ and $\mathbf{G}$ statistically independent. For
simplicity, we assume $\max\{n, m\} \le \ell$. This model turns out
to be well-suited for random linear network coding systems [7]
in the absence of malicious nodes, but possibly subject to link
erasures. In this context, $\mathbf{X}$ is the matrix whose rows are the
$n$ packets transmitted by the source node, $\mathbf{Y}$ is the matrix
whose rows are the $m$ packets received by the sink node, and
$\ell$ is the number of $q$-ary symbols in each packet. Also, $\mathbf{G}$ is
the network transfer matrix, whose probability distribution
is dictated by the network topology, the random choices of
coding coefficients, and the link erasure probabilities.

This work was supported in part by CNPq-Brazil. The material in this
paper was presented in part at the 2011 IEEE International Symposium on
Information Theory [1]. Some of the earlier ideas on which this work is based
appeared in an unpublished draft [2].

The authors are with the Department of Electrical Engineering of the
Federal University of Santa Catarina, Florianopolis 88040-970, Brazil, (email:
rwnobrega@eel.ufsc.br; danilo@eel.ufsc.br; uchoa@eel.ufsc.br).

Copyright (c) 2013 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Throughout this paper, random entities are represented using boldface
letters, while italic letters are used for their samples.

Multiplicative finite-field matrix channels have been previously
considered by Silva et al. [5] and Jafari et al. [6].
Specifically, in [5], $\mathbf{G}$ is chosen uniformly at random among
all full-rank matrices, while in [6], $\mathbf{G}$ has i.i.d. entries selected
uniformly at random (or, equivalently, $\mathbf{G}$ is uniform over all
matrices). Although these transfer matrix distributions could
in principle be used to model random linear network coding
systems, they cannot properly reflect different network topologies
or accurately describe systems in which link erasures
play an important role. This is because in these models the
transfer matrix distribution is completely specified by the field
size $q$ and the dimensions $n$ and $m$. On the other hand,
a full description of a completely general transfer matrix
distribution requires, in addition, the specification of $q^{nm}$
parameters (namely, $\Pr[\mathbf{G} = G]$, for $G \in \mathbb{F}_q^{m \times n}$), therefore
being impractical even for modest values of $q$, $n$ and $m$.

In view of this tension between tractability and generality, 
the present paper suggests a new model which generalizes 
both the models of [5] and [6], but still keeps to a realistic
level the amount of information needed to describe the
channel. Specifically, we allow the probability distribution 
of the rank of G to be arbitrary; nevertheless we consider 
that all matrices with the same rank are equiprobable. We 
say such a transfer matrix is uniform given rank (abbreviated 
as u.g.r.). Under this assumption, the probability distribution 
of the rank of the transfer matrix completely determines the 
distribution of the transfer matrix itself and, therefore, also 
completely determines the channel. Thus, the model only 
requires $\min\{n, m\} + 1$ parameters to describe the channel
(namely, $\Pr[\operatorname{rank} \mathbf{G} = r]$, for $0 \le r \le \min\{n, m\}$). While it
is a challenging problem to obtain the rank distribution ana- 
lytically for a general network topology (even in the simplest 
case of erasure-free links), in practice, a reasonable estimate 
may be obtained more simply by Monte Carlo simulation for a 
given network model. In fact, the (empirical) rank distribution 
is a natural figure of merit for most noncoherent network 
coding implementations (see, e.g., [8]). Thus, it is not entirely
unrealistic to assume that this information is indeed available. 

In order to convince the reader of the usefulness of the
proposed model in practical scenarios, we provide an example
(see Section IV) on how the u.g.r. transfer matrix is able to
better capture some properties of noncoherent network coding
systems when compared to existing models. Specifically, we
will see that for certain network topologies, the capacities
in [5], [6] deviate more and more from the true capacity as the
(graph) distance between the source and sink nodes increases 
or the link erasure probability grows. Furthermore, as we shall 
prove, any MMC can be reduced to our model (although with 
a potential decrease in the channel capacity) by means of 
a simple preprocessing at the transmitter and receiver. Since 
this preprocessing does not alter the rank distribution of the 
transfer matrix, this implies that among all transfer matrices 
sharing the same rank distribution, the u.g.r. is the one with 
lowest channel capacity. In this sense, the u.g.r. model seems to 
arise naturally in the study of multiplicative finite-field matrix 
channels. 

In this paper, we concentrate on the problem of finding 
the capacity and mutual information of the MMC with u.g.r. 
transfer matrix. We show that the capacity is achieved when 
the input matrix (similarly to the transfer matrix) is u.g.r, and 
an expression for the mutual information is derived for this 
kind of input. As a consequence, we are able to greatly reduce 
the complexity of the convex optimization problem involved 
in obtaining the channel capacity and the associated optimal 
input, when compared to the most general MMC model: a
reduction from $q^{n\ell}$ to $n + 1$ variables, as we shall see. We
then turn to the special situation of constant-rank input.
In this case, we are able to obtain a closed-form expression for
the constant-rank capacity. Later on, we consider the problem
in which $q$ or $\ell$ is allowed to grow arbitrarily, and show
that the true channel capacity is achieved by constant-rank
input. As a final contribution, we verify that communication
via subspaces is still optimal when the transfer matrix is u.g.r.
This generalizes similar conclusions previously obtained in [5]
and [6].

A related line of work by Yang et al. [9]–[12], done
concurrently with and independently of our work, considers
a completely general transfer matrix distribution (with the
transfer matrix still independent of the input). They were
able to identify a class of inputs (which they call "α-type")
that is sufficient to achieve the channel capacity. As a result,
the number of optimization variables required to compute the
channel capacity is reduced, although to a number that is still
exponential in the matrix size. They also derive upper and
lower bounds on the capacity which depend only on the rank
distribution of the transfer matrix. It is worth mentioning that
some of our results can be obtained by specializing the results 
in [9] to a u.g.r. transfer matrix. (Appropriate comparisons are
made along the text whenever applicable.) Nevertheless, we 
believe that the approach we follow here is simpler and more 
insightful for this particular case. 

Finally, it is worth noticing that some of the results obtained
in this paper have been subsequently employed in [13], where
an arbitrarily varying channel approach to the MMC is
considered. More precisely, [13] assumes that the rank of
the transfer matrix is randomly chosen according to a known
probability distribution, but, apart from that, the transfer matrix
can be changed arbitrarily from time-slot to time-slot. It is
shown that the capacity of this channel is the same as the
capacity of the MMC with u.g.r. transfer matrix considered
here.

The remainder of this paper is organized as follows.
Section II presents some notation, basic facts, and a brief review
of discrete memoryless channels. Section III defines the
channel model under consideration. Section IV considers a
motivating example. Section V contains the main results of
this work, whose proofs are located in Section VI. Section VII
concludes the paper.

II. Notation and Background 

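Let $\mathbb{F}_q$ be a finite field. We denote by $\mathbb{F}_q^{m \times n}$ the set of all
$m \times n$ matrices with entries in $\mathbb{F}_q$, and by $\mathcal{T}_r(\mathbb{F}_q^{m \times n})$ those
matrices in $\mathbb{F}_q^{m \times n}$ with rank $r$. For notational convenience,
we sometimes set $\mathcal{T}_r = \mathcal{T}_r(\mathbb{F}_q^{m \times n})$ when the matrix dimension
$m \times n$ and the field size $q$ are implied by the context.
Also, $\mathcal{T}(\mathbb{F}_q^{m \times n}) = \mathcal{T}_{\min\{n, m\}}(\mathbb{F}_q^{m \times n})$ is the set of all $m \times n$
full-rank matrices. It is well-known (see, e.g., [14]) that

$$|\mathcal{T}(\mathbb{F}_q^{m \times r})| = \prod_{i=0}^{r-1} (q^m - q^i),$$

for $r \le m$, and

$$|\mathcal{T}_r(\mathbb{F}_q^{m \times n})| = |\mathcal{T}(\mathbb{F}_q^{m \times r})| \begin{bmatrix} n \\ r \end{bmatrix}_q, \tag{2}$$

where

$$\begin{bmatrix} n \\ r \end{bmatrix}_q = \begin{cases} \displaystyle\prod_{i=0}^{r-1} \frac{q^n - q^i}{q^r - q^i}, & \text{if } 0 \le r \le n, \\ 0, & \text{else,} \end{cases} \tag{3}$$
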



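denotes the Gaussian binomial coefficient. It is also known
that the Gaussian binomial coefficient satisfies [3, Lemma 4]

$$q^{r(n-r)} \le \begin{bmatrix} n \\ r \end{bmatrix}_q \le \gamma_q \, q^{r(n-r)}, \tag{4}$$

where

$$\gamma_q = \prod_{i=1}^{\infty} \frac{1}{1 - q^{-i}}.$$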
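
The counting formulas (2) and (3) are easy to check numerically. The following is a minimal Python sketch, not part of the original paper; the function names `gaussian_binomial`, `count_full_rank`, and `count_rank` are illustrative choices. It evaluates the formulas with exact integer arithmetic and confirms that the rank classes partition all $q^{mn}$ matrices.

```python
from math import prod


def gaussian_binomial(n: int, r: int, q: int) -> int:
    """Gaussian binomial coefficient [n, r]_q as in (3); 0 outside 0 <= r <= n."""
    if r < 0 or r > n:
        return 0
    numerator = prod(q**n - q**i for i in range(r))
    denominator = prod(q**r - q**i for i in range(r))
    return numerator // denominator  # the division is always exact


def count_full_rank(m: int, r: int, q: int) -> int:
    """Number of full-rank m x r matrices over F_q (meant for r <= m)."""
    return prod(q**m - q**i for i in range(r))


def count_rank(m: int, n: int, r: int, q: int) -> int:
    """Number of m x n matrices over F_q with rank exactly r, as in (2)."""
    return count_full_rank(m, r, q) * gaussian_binomial(n, r, q)


if __name__ == "__main__":
    q, m, n = 2, 3, 4
    counts = [count_rank(m, n, r, q) for r in range(min(m, n) + 1)]
    print(counts)                       # number of matrices of each rank
    print(sum(counts) == q ** (m * n))  # the rank classes cover every matrix
```

Exact integers are used on purpose: the counts grow like $q^{mn}$, so floating point would lose precision for all but the smallest parameters.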



In this paper, we let $\langle A \rangle$ denote the row space of a matrix $A$,
and $1[P]$ the indicator function of $P$, that is,

$$1[P] = \begin{cases} 1, & \text{if } P \text{ is true}, \\ 0, & \text{otherwise.} \end{cases}$$



A discrete memoryless channel (DMC) [15] with input $\mathbf{x}$
and output $\mathbf{y}$ is defined by a triplet $(\mathcal{X}, p_{\mathbf{y}|\mathbf{x}}, \mathcal{Y})$, where $\mathcal{X}$
and $\mathcal{Y}$ are the channel input and output alphabets, respectively,
and $p_{\mathbf{y}|\mathbf{x}}$, called the channel transition probability, gives the
conditional probability that $\mathbf{y} = y \in \mathcal{Y}$ is received given that
$\mathbf{x} = x \in \mathcal{X}$ is sent. The channel is memoryless in the sense
that what happens to the transmitted symbol at one time is
independent of what happens to the transmitted symbol at any
other time. The capacity of the DMC is then given by

$$C = \max_{p_{\mathbf{x}}} I(\mathbf{x}; \mathbf{y}),$$

where $I(\mathbf{x}; \mathbf{y})$ is the mutual information between $\mathbf{x}$ and $\mathbf{y}$, and
the maximization is over all possible input distributions $p_{\mathbf{x}}$.

An interesting question is whether input or output letters of
a DMC can be grouped together without reducing the channel
mutual information. The following result (see, e.g., [16, §5.9–5.10])
derives the conditions under which such groupings are
information-lossless.

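Lemma 1: Let $(\mathcal{X}, p_{\mathbf{y}|\mathbf{x}}, \mathcal{Y})$ be a DMC with input $\mathbf{x}$ and
output $\mathbf{y}$. In addition, let $f : \mathcal{X} \to \mathcal{U}$ and $g : \mathcal{Y} \to \mathcal{V}$ be
surjective functions, and define $\mathbf{u} = f(\mathbf{x})$ and $\mathbf{v} = g(\mathbf{y})$. The
following holds:

1) $I(\mathbf{x}; \mathbf{y}) = I(\mathbf{u}; \mathbf{y})$ for all $p_{\mathbf{x}}$ if and only if, for every pair
$x, x' \in \mathcal{X}$ satisfying $f(x) = f(x')$, we have $p_{\mathbf{y}|\mathbf{x}}(y|x) = p_{\mathbf{y}|\mathbf{x}}(y|x')$
for all $y \in \mathcal{Y}$.

2) $I(\mathbf{x}; \mathbf{y}) = I(\mathbf{x}; \mathbf{v})$ for all $p_{\mathbf{x}}$ if and only if, for every pair
$y, y' \in \mathcal{Y}$ satisfying $g(y) = g(y')$, there exists some real
number $a$ such that $p_{\mathbf{y}|\mathbf{x}}(y'|x) = a \, p_{\mathbf{y}|\mathbf{x}}(y|x)$ for all
$x \in \mathcal{X}$.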

III. Channel Model 

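The MMC described by the channel law (1) can naturally
be viewed as a DMC defined by

$$(\mathcal{X} = \mathbb{F}_q^{n \times \ell}, \; p_{\mathbf{Y}|\mathbf{X}}, \; \mathcal{Y} = \mathbb{F}_q^{m \times \ell}),$$

where the channel transition probability is given by

$$p_{\mathbf{Y}|\mathbf{X}}(Y|X) = \sum_{G} p_{\mathbf{G}|\mathbf{X}}(G|X) \, p_{\mathbf{Y}|\mathbf{X},\mathbf{G}}(Y|X, G) = \sum_{G} p_{\mathbf{G}}(G) \, 1[Y = GX]$$

(and thus completely characterized by $p_{\mathbf{G}}$). This work deals
with a special class of this channel, in which the transfer
matrix $\mathbf{G}$ is "uniform given rank," a concept defined next.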

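Definition: A random matrix $\mathbf{A} \in \mathbb{F}_q^{m \times n}$ distributed according
to $p_{\mathbf{A}}$ is said to be uniform given rank (u.g.r., for short) if,
for every $A, A' \in \mathbb{F}_q^{m \times n}$, we have $p_{\mathbf{A}}(A) = p_{\mathbf{A}}(A')$ whenever
$\operatorname{rank} A = \operatorname{rank} A'$.

Let $\mathbf{A}$ be a random matrix over $\mathbb{F}_q^{m \times n}$ with probability
distribution $p_{\mathbf{A}}$. Also, let $\mathbf{k} = \operatorname{rank} \mathbf{A}$; this is a random
variable taking values on $\{0, \ldots, \min\{n, m\}\}$ according to
a probability distribution $p_{\mathbf{k}}$ given by

$$p_{\mathbf{k}}(k) = \sum_{A \in \mathcal{T}_k} p_{\mathbf{A}}(A).$$

Then, it is clear that $\mathbf{A}$ is u.g.r. if and only if

$$p_{\mathbf{A}}(A) = \frac{p_{\mathbf{k}}(k)}{|\mathcal{T}_k(\mathbb{F}_q^{m \times n})|},$$

where $k = \operatorname{rank} A$. In this way, the rank probability distribution
completely determines $p_{\mathbf{A}}$ for $\mathbf{A}$ u.g.r. In addition, it
is not hard to show that the entropy of $\mathbf{A}$ satisfies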

$$H(\mathbf{A}) \le \sum_{k} p_{\mathbf{k}}(k) \log_q \frac{|\mathcal{T}_k(\mathbb{F}_q^{m \times n})|}{p_{\mathbf{k}}(k)}, \tag{5}$$

with equality when $\mathbf{A}$ is u.g.r. This is because among all
matrices with a given rank probability distribution, the u.g.r.
is the one with largest entropy.

Fig. 1: Turning an arbitrary MMC into an MMC with u.g.r. transfer matrix $\mathbf{G}' = \mathbf{T}_1 \mathbf{G} \mathbf{T}_2$.
The rank distribution of the new channel is the same as the original channel.

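As said before, both the models of Silva et al. [5] and
Jafari et al. [6] are special cases of the u.g.r. model considered
here. Indeed, let $\mathbf{r} = \operatorname{rank} \mathbf{G}$, distributed according to
$p_{\mathbf{r}}(r) = \sum_{G \in \mathcal{T}_r} p_{\mathbf{G}}(G)$, be the random variable representing
the rank of the transfer matrix. Then, for the channel model
in [5], where $\mathbf{G}$ is uniformly distributed over $\mathcal{T}(\mathbb{F}_q^{n \times n})$, we
have

$$p_{\mathbf{r}}(r) = \begin{cases} 1, & \text{if } r = n, \\ 0, & \text{else,} \end{cases} \tag{6}$$

while for the channel model in [6], where $\mathbf{G}$ is uniformly
distributed over $\mathbb{F}_q^{m \times n}$, we have

$$p_{\mathbf{r}}(r) = \frac{|\mathcal{T}_r(\mathbb{F}_q^{m \times n})|}{q^{mn}}. \tag{7}$$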



We remark that every MMC can be artificially transformed
into an MMC with u.g.r. transfer matrix (having the same rank
distribution as the original channel) by means of "randomization"
at both the transmitter and receiver. Theorem 2 below
makes this precise. We prove this theorem as an application
of a generalized version of the crypto lemma [17], which
may be useful in other applications. The proofs are given in
Appendix A.

Theorem 2: Let $\mathbf{G} \in \mathbb{F}_q^{m \times n}$ be a random matrix with
arbitrary probability distribution, and define $\mathbf{G}' = \mathbf{T}_1 \mathbf{G} \mathbf{T}_2$,
where $\mathbf{T}_1 \in \mathcal{T}(\mathbb{F}_q^{m \times m})$ and $\mathbf{T}_2 \in \mathcal{T}(\mathbb{F}_q^{n \times n})$ are uniformly
distributed full-rank square matrices, independent of $\mathbf{G}$ and
of each other. Then, $\mathbf{G}'$ is u.g.r. and has the same rank
distribution as $\mathbf{G}$.
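
For very small parameters, the statement of Theorem 2 can be checked by exhaustive enumeration. The sketch below is illustrative only and is not taken from the paper: it assumes $q = 2$ and $m = n = 2$, fixes a deterministic $G$ (the most non-uniform case), and tabulates the exact distribution of $T_1 G T_2$ over all pairs of invertible matrices. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

Q = 2  # field size assumed for this illustration


def matmul(a, b):
    """Multiply two matrices (tuples of tuples) with entries reduced mod Q."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(len(b))) % Q
              for j in range(len(b[0])))
        for i in range(len(a))
    )


def rank2x2(a):
    """Rank of a 2 x 2 matrix over F_2."""
    if not any(any(row) for row in a):
        return 0
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % Q
    return 2 if det else 1


ALL_MATRICES = [((a, b), (c, d)) for a, b, c, d in product(range(Q), repeat=4)]
INVERTIBLE = [t for t in ALL_MATRICES if rank2x2(t) == 2]


def randomized_distribution(g):
    """Exact distribution of T1 * g * T2 for independent uniform invertible T1, T2."""
    counts = Counter(matmul(matmul(t1, g), t2)
                     for t1 in INVERTIBLE for t2 in INVERTIBLE)
    total = len(INVERTIBLE) ** 2
    return {mat: Fraction(c, total) for mat, c in counts.items()}


if __name__ == "__main__":
    for g_prime, prob in sorted(randomized_distribution(((1, 0), (0, 0))).items()):
        print(g_prime, prob)  # every rank-1 matrix appears with equal probability
```

Since a general $\mathbf{G}$ is a mixture of such deterministic matrices, uniformity within each rank class for every fixed $G$ is what the theorem requires.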

Effectively (see Fig. 1), instead of transmitting the original
source packets (say $\mathbf{X}'$), the transmitter sends $\mathbf{X} = \mathbf{T}_2 \mathbf{X}'$;
and instead of the actual channel output (say $\mathbf{Y}$), the receiver
considers $\mathbf{Y}' = \mathbf{T}_1 \mathbf{Y}$ for decoding. (Here, $\mathbf{T}_1$ and $\mathbf{T}_2$
are defined as in Theorem 2.) Consequently, if the transfer
matrix of the original channel is $\mathbf{G}$, we have $\mathbf{Y}' = \mathbf{T}_1 \mathbf{Y} =
\mathbf{T}_1 \mathbf{G} \mathbf{X} = \mathbf{T}_1 \mathbf{G} \mathbf{T}_2 \mathbf{X}' = \mathbf{G}' \mathbf{X}'$, where $\mathbf{G}'$, according to
Theorem 2, is u.g.r. and has the same rank distribution as $\mathbf{G}$.
Naturally, from the data-processing inequality [15], we have
$I(\mathbf{X}'; \mathbf{Y}') \le I(\mathbf{X}; \mathbf{Y})$, so that this transformation comes at
the expense of a potential reduction of the channel capacity.

Thus, we conclude that, among all transfer matrices sharing 
the same rank distribution, the u.g.r. is the one with lowest 
channel capacity, and that any capacity result obtained for the 
MMC with u.g.r. transfer matrix can be used as a lower bound 
for MMCs with non-u.g.r. transfer matrices. 
Fig. 2: Wireless layered relay network. There are L layers, and each layer
has N relay nodes. 



A few more comments are in order. First, note that randomization
at the transmitter (but not at the receiver) is already a
usual practice in random linear network coding systems [3].
Second, since both the multiplication of matrices and the
generation of a random invertible matrix can be accomplished
in polynomial time, the randomization is also a polynomial-time
procedure. Third, because $\mathbf{T}_1$ and $\mathbf{T}_2$ are independent
of $\mathbf{G}$ and of each other, no channel knowledge is assumed,
and no common randomness shared by the transmitter and
receiver is required. Finally, for a numerical quantification of
the rate loss incurred by randomization, refer to Example 2 in
Section IV.

IV. Motivating Example 

In this section, we present an example showing how the
u.g.r. model is able to better model a noncoherent network
coding system. Consider the wireless relay network depicted
in Fig. 2, with $L$ layers (columns) and $N$ relay nodes per layer.
Assume that the system operates with packets of length $\ell$, and
that between each two consecutive layers (also between the
source node and layer 1, and between layer $L$ and the sink node) there
are $N$ orthogonal broadcast channels, which are subject to
independent erasures occurring at the end of the channel with
probability $\epsilon$. Whenever a packet is erased, it is considered
to be received as the all-zero vector. In addition, assume that
there is no communication between nonadjacent layers, as well
as between nodes in the same layer.

The system operates as follows. First, the source node trans- 
mits packets to the first layer by using all the N orthogonal 
broadcast channels. It repeats this process M times, so that 
a total of MN packets is received by each node in the 
first layer. (It is assumed that the source does not perform 
any randomization.) After that, each node in the first layer 
computes M random linear combinations (with i.i.d. uniform 
coefficients in $\mathbb{F}_q$) of all its received packets, and broadcasts
these linear combinations to the second layer, again in M time 
slots, by using one of the N orthogonal channels assigned to 
it. In this way, a total of MN packets is received by each 
node in the second layer, M from each node of the first layer. 
The system operates similarly up to layer L. Finally, the sink 
node receives MN packets, M from each node in layer L. 

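We now show that this system can be modeled as an
MMC with $n = m = MN$. Let $\mathbf{X} \in \mathbb{F}_q^{MN \times \ell}$ (resp.,
$\mathbf{Y} \in \mathbb{F}_q^{MN \times \ell}$) denote the matrix whose rows are the packets
transmitted (resp., received) by the source (resp., sink) node.
Let $\mathbf{R}_{i,j} \in \mathbb{F}_q^{MN \times \ell}$ (resp., $\mathbf{S}_{i,j} \in \mathbb{F}_q^{M \times \ell}$) denote the matrix
whose rows are the packets received (resp., transmitted) by
the $j$-th relay node of the $i$-th layer, for $1 \le i \le L$ and
$1 \le j \le N$. From the network operation just described, we
know that

$$\mathbf{S}_{i,j} = \mathbf{A}_{i,j} \mathbf{R}_{i,j},$$

for $1 \le i \le L$ and $1 \le j \le N$, where $\mathbf{A}_{i,j} \in \mathbb{F}_q^{M \times MN}$ are
matrices whose entries are i.i.d. selected uniformly at random.
We also know that $\mathbf{R}_{1,j} = \mathbf{E}_{1,j} \mathbf{X}$ for $1 \le j \le N$ (each first-layer
node receives the $MN$ source packets through its own erasures), that

$$\mathbf{R}_{i,j} = \mathbf{E}_{i,j} \begin{bmatrix} \mathbf{S}_{i-1,1} \\ \vdots \\ \mathbf{S}_{i-1,N} \end{bmatrix}$$

for $2 \le i \le L$ and $1 \le j \le N$, and that

$$\mathbf{Y} = \mathbf{E}' \begin{bmatrix} \mathbf{S}_{L,1} \\ \vdots \\ \mathbf{S}_{L,N} \end{bmatrix},$$

where $\mathbf{E}_{i,j}, \mathbf{E}' \in \mathbb{F}_q^{MN \times MN}$
are diagonal matrices (modeling the erasures) whose diagonal
entries are i.i.d. with $p(0) = \epsilon$ and $p(1) = 1 - \epsilon$. From this,
we can deduce that $\mathbf{Y} = \mathbf{G}\mathbf{X}$, where

$$\mathbf{G} = \mathbf{E}' \mathbf{A}_L \mathbf{E}_L \cdots \mathbf{A}_2 \mathbf{E}_2 \mathbf{A}_1 \mathbf{E}_1, \tag{8}$$

in which $\mathbf{A}_i \in \mathbb{F}_q^{MN \times MN^2}$ (a block-diagonal matrix) and
$\mathbf{E}_i \in \mathbb{F}_q^{MN^2 \times MN}$ are given by

$$\mathbf{A}_i = \begin{bmatrix} \mathbf{A}_{i,1} & & \\ & \ddots & \\ & & \mathbf{A}_{i,N} \end{bmatrix}, \qquad \mathbf{E}_i = \begin{bmatrix} \mathbf{E}_{i,1} \\ \vdots \\ \mathbf{E}_{i,N} \end{bmatrix}.$$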
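
The factorization (8) lends itself to a simple Monte Carlo estimate of the rank distribution. The following Python sketch is an illustration under stated assumptions, not the authors' simulation code: it fixes $q = 2$, takes the block shapes to be $MN \times MN^2$ for $\mathbf{A}_i$ and $MN^2 \times MN$ for $\mathbf{E}_i$ as described above, and uses hypothetical helper names throughout.

```python
import random


def matmul_gf2(a, b):
    """Product of two 0/1 matrices (lists of rows) over F_2."""
    columns = list(zip(*b))
    return [[sum(x & y for x, y in zip(row, col)) & 1 for col in columns]
            for row in a]


def rank_gf2(matrix):
    """Rank over F_2, by elimination on rows packed into integers."""
    rows = [int("".join(map(str, row)), 2) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lowest_bit = pivot & -pivot
            rows = [r ^ pivot if r & lowest_bit else r for r in rows]
    return rank


def erasure_matrix(size, eps, rng):
    """Diagonal matrix whose diagonal entries are 0 with probability eps."""
    return [[int(i == j and rng.random() >= eps) for j in range(size)]
            for i in range(size)]


def sample_transfer_matrix(L, N, M, eps, rng):
    """One realization of G = E' A_L E_L ... A_1 E_1 over F_2, following (8)."""
    n = M * N
    g = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _layer in range(L):
        # E_i: N stacked independent diagonal erasure matrices (MN^2 x MN).
        e_i = [row for _j in range(N) for row in erasure_matrix(n, eps, rng)]
        # A_i: block diagonal with N uniform M x MN blocks (MN x MN^2).
        a_i = [[rng.randrange(2) if col // n == j else 0 for col in range(n * N)]
               for j in range(N) for _row in range(M)]
        g = matmul_gf2(matmul_gf2(a_i, e_i), g)
    return matmul_gf2(erasure_matrix(n, eps, rng), g)


def empirical_rank_distribution(L, N, M, eps, trials, seed=0):
    """Monte Carlo estimate of the rank distribution for ranks 0, ..., MN."""
    rng = random.Random(seed)
    counts = [0] * (M * N + 1)
    for _trial in range(trials):
        counts[rank_gf2(sample_transfer_matrix(L, N, M, eps, rng))] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    print(empirical_rank_distribution(L=1, N=2, M=2, eps=0.25, trials=10_000))
```

Only the rank of each sampled matrix is recorded, which matches the fact that the u.g.r. model needs nothing beyond the rank distribution.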









Note that, in general, the transfer matrix given in (8) is
not u.g.r. Therefore, as mentioned in Section III, the capacity
results from Section V will serve only as lower bounds on
the channel capacity. We call attention to the fact
that the calculation of the true value of the channel capacity
is a computationally heavy task, even for small values of the
parameters. For example, when $q = 2$ and $n = m = \ell = 8$,
a priori, we need to solve an optimization problem over
$q^{n\ell} = 2^{64}$ variables, which is clearly impractical. According
to [9], we could simplify the problem to $\sum_{i=0}^{n} \begin{bmatrix} n \\ i \end{bmatrix}_q > 2^{18}$
variables, but this number is still impractical.
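
The variable counts quoted above can be reproduced in a few lines. This is a hedged sketch: it assumes that the reduced count is the sum of Gaussian binomial coefficients $\sum_{i=0}^{n} \begin{bmatrix} n \\ i \end{bmatrix}_q$ (the form reconstructed from the damaged source), with $q = 2$ and $n = \ell = 8$; the function and variable names are illustrative.

```python
from math import prod


def gaussian_binomial(n: int, r: int, q: int) -> int:
    """Gaussian binomial coefficient [n, r]_q; 0 outside 0 <= r <= n."""
    if r < 0 or r > n:
        return 0
    return (prod(q**n - q**i for i in range(r))
            // prod(q**r - q**i for i in range(r)))


q = 2
n = ell = 8

# General MMC: one optimization variable per input matrix in F_q^{n x ell}.
general_variables = q ** (n * ell)

# Assumed reduced count: one variable per subspace, summed over all dimensions.
reduced_variables = sum(gaussian_binomial(n, i, q) for i in range(n + 1))

# u.g.r. model: one variable per input rank.
ugr_variables = n + 1

if __name__ == "__main__":
    print(general_variables == 2**64)
    print(reduced_variables, reduced_variables > 2**18)
    print(ugr_variables)
```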



Example 1: Figs. 3a and 3b show the rank distribution $p_{\mathbf{r}}$
induced by the wireless layered relay network with $q = 2$ and
$N = M = 2$ (thus, $n = m = MN = 4$), as a function of $\epsilon$, for
$L = 1$, and as a function of $L$, for $\epsilon = 0$, respectively. Note that
the value of $\ell$ is unimportant here. Both rank distributions were
obtained from (8) by the Monte Carlo method with 100,000
realizations.

Figs. 3c and 3d show the channel capacity of the corresponding
MMC assuming u.g.r. transfer matrix, with the rank
distributions of Figs. 3a and 3b and considering a packet
length $\ell = 8$. The results were obtained from Theorem 4 of
Section V. The figures also show the capacity obtained for a
system with the same parameters $q$, $n$, $m$, and $\ell$, but modeled
with a full-rank uniform transfer matrix [5] or with a uniform
transfer matrix [6], as well as the coherent upper bound of [9]
(i.e., the channel capacity assuming that both the transmitter
and receiver know the transfer matrix). ■
Clearly, the models of [5] and [6] are insensitive to the
effects of link erasures and variations on the topology (here
illustrated by the number of layers). The capacities for these
models are seen to deviate substantially from the true capacity.
In contrast, from the trends of the lower and upper bound
curves, it can be inferred that the capacity for the u.g.r. model
behaves much like the true capacity (note that the upper bound
goes to zero as $\epsilon$ approaches one or $L$ increases; therefore, so
does the true capacity). In fact, as the next example illustrates,
the u.g.r. lower bound may actually be close to the true
capacity.

Example 2: This example aims to quantify the rate loss
incurred by treating a matrix channel as u.g.r. when,
in fact, it is not. To this end, we consider the wireless layered
relay network with field size $q = 2$, a single layer ($L = 1$),
and two relay nodes ($N = 2$). We also set $M = 1$, so that
$n = m = 2$. In this case, (8) yields

$$\mathbf{G} = \mathbf{E}' \mathbf{A}_1 \mathbf{E}_1 = \begin{bmatrix} \mathbf{e}_5 \mathbf{a}_1 \mathbf{e}_1 & \mathbf{e}_5 \mathbf{a}_2 \mathbf{e}_2 \\ \mathbf{e}_6 \mathbf{a}_3 \mathbf{e}_3 & \mathbf{e}_6 \mathbf{a}_4 \mathbf{e}_4 \end{bmatrix},$$

where $\mathbf{e}_1, \ldots, \mathbf{e}_6 \in \mathbb{F}_2$ (related to the erasures) are i.i.d. with
$\Pr[\mathbf{e}_i = 0] = \epsilon$, and $\mathbf{a}_1, \ldots, \mathbf{a}_4 \in \mathbb{F}_2$ (the network coding
coefficients) are i.i.d. with $\Pr[\mathbf{a}_i = 0] = 1/2$. The transfer
matrix distribution $p_{\mathbf{G}}(G)$ with $\epsilon = 1/4$ is shown in Fig. 4a,
which also shows the corresponding u.g.r. distribution.

Fig. 4b shows the true channel capacity (obtained by solving
the original maximization problem over $q^{n\ell} = 64$ variables),
along with the u.g.r. lower bound (obtained by solving a
maximization problem over $n + 1 = 3$ variables, according
to Theorem 4) and the coherent upper bound (given by $E[\mathbf{r}]$,
according to Yang et al. [9]), as a function of $\epsilon$, for a packet
length $\ell = 3$. It is interesting to observe that the u.g.r. lower
bound is tight for $\epsilon = 0$, since in this case $\mathbf{G}$ becomes
uniformly distributed over $\mathbb{F}_q^{m \times n}$, and thus u.g.r. Also, for
all other values of $\epsilon$, the true capacity is very close to the
u.g.r. lower bound, which constitutes evidence that the
u.g.r. model serves as a good approximation for noncoherent
network coding systems. ■
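
Because Example 2 involves only ten binary random variables, the distribution of $\mathbf{G}$ can be tabulated exactly rather than simulated. The sketch below is illustrative (the function names are not from the paper): it enumerates all values of $e_1, \ldots, e_6$ and $a_1, \ldots, a_4$, accumulates the exact probability of each $2 \times 2$ matrix with rational arithmetic, and derives the rank distribution from it.

```python
from fractions import Fraction
from itertools import product


def transfer_matrix_distribution(eps):
    """Exact distribution of G = E' A_1 E_1 over F_2 for L = 1, N = 2, M = 1."""
    eps = Fraction(eps)
    dist = {}
    for e in product((0, 1), repeat=6):
        # Each erasure variable equals 0 with probability eps, independently.
        p_e = Fraction(1)
        for bit in e:
            p_e *= (1 - eps) if bit else eps
        if p_e == 0:
            continue
        for a in product((0, 1), repeat=4):
            # The four coding coefficients are uniform, hence the factor 1/16.
            g = ((e[4] * a[0] * e[0], e[4] * a[1] * e[1]),
                 (e[5] * a[2] * e[2], e[5] * a[3] * e[3]))
            dist[g] = dist.get(g, Fraction(0)) + p_e / 16
    return dist


def rank2x2(g):
    """Rank of a 2 x 2 matrix over F_2."""
    if not any(g[0]) and not any(g[1]):
        return 0
    det = (g[0][0] * g[1][1] + g[0][1] * g[1][0]) % 2
    return 2 if det else 1


def rank_distribution(dist):
    """Collapse a matrix distribution into the distribution of the rank."""
    out = [Fraction(0)] * 3
    for g, p in dist.items():
        out[rank2x2(g)] += p
    return out


if __name__ == "__main__":
    dist = transfer_matrix_distribution(Fraction(1, 4))
    print(rank_distribution(dist))
    # Matrices of equal rank with unequal probability show that G is not u.g.r.
    rank1_probs = {p for g, p in dist.items() if rank2x2(g) == 1}
    print(len(rank1_probs) > 1)
```

For $\epsilon = 0$ the same routine returns the uniform distribution over all 16 matrices, which is the case where the text notes that the u.g.r. bound is tight.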



V. Main Results 



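This section presents the main results of this work, whose
proofs are left to Section VI. In what follows, we consider
an MMC with input matrix $\mathbf{X}$, output matrix $\mathbf{Y}$, and u.g.r.
transfer matrix $\mathbf{G}$. In addition to $\mathbf{r} = \operatorname{rank} \mathbf{G}$, distributed
according to $p_{\mathbf{r}}(r) = \sum_{G \in \mathcal{T}_r} p_{\mathbf{G}}(G)$, we also make use of
the random variables $\mathbf{u} = \operatorname{rank} \mathbf{X}$ and $\mathbf{v} = \operatorname{rank} \mathbf{Y}$, whose
probability distributions are given by $p_{\mathbf{u}}(u) = \sum_{X \in \mathcal{T}_u} p_{\mathbf{X}}(X)$
and $p_{\mathbf{v}}(v) = \sum_{Y \in \mathcal{T}_v} p_{\mathbf{Y}}(Y)$, respectively.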

The rank transition probability, that is, the probability of
receiving a matrix with rank $\mathbf{v} = v$ given that the transmitted
matrix has rank $\mathbf{u} = u$, plays an important role in this work.
Since $\mathbf{u} \to \mathbf{X} \to \mathbf{Y} \to \mathbf{v}$ forms a Markov chain, the rank
transition probability is given by

$$p_{\mathbf{v}|\mathbf{u}}(v|u) = \sum_{X, Y} p_{\mathbf{v}|\mathbf{Y}}(v|Y) \, p_{\mathbf{Y}|\mathbf{X}}(Y|X) \, p_{\mathbf{X}|\mathbf{u}}(X|u) = \sum_{X \in \mathcal{T}_u} p_{\mathbf{X}|\mathbf{u}}(X|u) \sum_{Y \in \mathcal{T}_v} p_{\mathbf{Y}|\mathbf{X}}(Y|X),$$

and, therefore, may depend not only on $p_{\mathbf{Y}|\mathbf{X}}$ (i.e., on $p_{\mathbf{G}}$),
but also on $p_{\mathbf{X}|\mathbf{u}}$. In the next theorem, we find the value of
the rank transition probability for the case of a u.g.r. transfer
matrix, and we show that it is independent of $p_{\mathbf{X}|\mathbf{u}}$. We also
determine the channel transition probability in terms of the
rank transition probability.

(a) True and u.g.r. transfer matrix distributions, for $\epsilon = 1/4$.
(b) True capacity and bounds, as a function of $\epsilon$, for

Fig. 4: Transfer matrix distribution and channel capacity for the wireless layered relay network with L = 1, N = 2, M = 1, and q = 2. In Fig. 4a, the
horizontal axis consists of all the matrices in $\mathbb{F}_2^{2 \times 2}$, ordered from left to right as follows: [00; 00], [10; 00], [01; 00], [00; 10], [00; 01], [11; 00], [00; 11],
[10; 10], [01; 01], [11; 11], [10; 01], [01; 10], [11; 10], [11; 01], [10; 11], [01; 11].

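Theorem 3: The following holds for the MMC with u.g.r.
transfer matrix:

1) Let $u$, $v$, and $r$ be nonnegative integers such that $r \le
\min\{n, m\}$. We have

$$p_{\mathbf{v}|\mathbf{u},\mathbf{r}}(v|u, r) = \frac{\begin{bmatrix} u \\ v \end{bmatrix}_q \begin{bmatrix} n - u \\ r - v \end{bmatrix}_q q^{v(n-u-r+v)}}{\begin{bmatrix} n \\ r \end{bmatrix}_q}, \tag{9}$$

which does not depend on $p_{\mathbf{X}|\mathbf{u}}$. Thus, the rank transition
probability is given by

$$p_{\mathbf{v}|\mathbf{u}}(v|u) = \sum_{r} p_{\mathbf{r}}(r) \, p_{\mathbf{v}|\mathbf{u},\mathbf{r}}(v|u, r),$$

and the output rank probability is given by

$$p_{\mathbf{v}}(v) = \sum_{u} p_{\mathbf{u}}(u) \, p_{\mathbf{v}|\mathbf{u}}(v|u).$$

2) The channel transition probability is given by

$$p_{\mathbf{Y}|\mathbf{X}}(Y|X) = \begin{cases} \dfrac{p_{\mathbf{v}|\mathbf{u}}(v|u)}{|\mathcal{T}_v(\mathbb{F}_q^{m \times u})|}, & \text{if } \langle Y \rangle \subseteq \langle X \rangle, \\ 0, & \text{else,} \end{cases} \tag{10}$$

where $u = \operatorname{rank} X$ and $v = \operatorname{rank} Y$.
Moreover, if the input $\mathbf{X}$ is u.g.r., so is the output $\mathbf{Y}$.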

Remark: Let $u$, $v$, and $r$ be nonnegative integers such that
$r \le \min\{n, m\}$. Recall from (3) that the Gaussian binomial
coefficient $\begin{bmatrix} x \\ y \end{bmatrix}_q$ is nonzero if and only if $0 \le y \le x$. Thus,
according to (9), we have $p_{\mathbf{v}|\mathbf{u},\mathbf{r}}(v|u, r) \ne 0$ if and only if $0 \le
v \le u$ and $0 \le r - v \le n - u$; these, in turn, are equivalent to
$u + r - n \le v \le \min\{u, r\}$. This is expected: the upper bound
follows trivially because $\operatorname{rank} GX \le \min\{\operatorname{rank} X, \operatorname{rank} G\}$,
and the lower bound follows from Sylvester's rank inequality,
which says that, if $G$ and $X$ are matrices of sizes $m \times n$ and
$n \times \ell$, respectively, then $\operatorname{rank} X + \operatorname{rank} G - n \le \operatorname{rank} GX$.
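
The rank transition probability of Theorem 3 can be evaluated exactly and sanity-checked. The sketch below assumes the form $\begin{bmatrix} u \\ v \end{bmatrix}_q \begin{bmatrix} n-u \\ r-v \end{bmatrix}_q q^{v(n-u-r+v)} / \begin{bmatrix} n \\ r \end{bmatrix}_q$ for (9), as reconstructed from the damaged source; `rank_transition` is an illustrative name. It checks that the values sum to one over $v$ (an instance of the $q$-Vandermonde identity) and that the support matches the range given in the remark.

```python
from fractions import Fraction
from math import prod


def gaussian_binomial(n: int, r: int, q: int) -> int:
    """Gaussian binomial coefficient [n, r]_q; 0 outside 0 <= r <= n."""
    if r < 0 or r > n:
        return 0
    return (prod(q**n - q**i for i in range(r))
            // prod(q**r - q**i for i in range(r)))


def rank_transition(v: int, u: int, r: int, n: int, q: int) -> Fraction:
    """Probability that rank(GX) = v given rank X = u and rank G = r, per (9)."""
    a = gaussian_binomial(u, v, q)
    b = gaussian_binomial(n - u, r - v, q)
    if a == 0 or b == 0:
        return Fraction(0)  # v lies outside the admissible range
    # Here 0 <= v <= u and 0 <= r - v <= n - u, so the exponent is nonnegative.
    return Fraction(a * b * q ** (v * (n - u - r + v)),
                    gaussian_binomial(n, r, q))


if __name__ == "__main__":
    n, q = 4, 2
    for u in range(n + 1):
        for r in range(n + 1):
            row = [rank_transition(v, u, r, n, q) for v in range(n + 1)]
            support = [v for v, p in enumerate(row) if p > 0]
            print(u, r, sum(row), support)
```

Running the loop shows each row summing to 1, with support exactly $\max\{0, u + r - n\} \le v \le \min\{u, r\}$.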

We next derive the channel capacity. We will see that u.g.r.
input suffices to achieve the capacity, so that there is no need
to consider more general inputs. Let

$$I^*(p_{\mathbf{u}}) = \max_{p_{\mathbf{X}} : \, p_{\mathbf{u}}} I(\mathbf{X}; \mathbf{Y}), \tag{11}$$

where the maximum is over the collection of all matrix
probability distributions $p_{\mathbf{X}}$ with associated rank probability
distribution equal to $p_{\mathbf{u}}$, that is, over the set

$$\Big\{ p_{\mathbf{X}} : \sum_{X \in \mathcal{T}_u} p_{\mathbf{X}}(X) = p_{\mathbf{u}}(u), \text{ for } u = 0, \ldots, n \Big\}.$$

Theorem 4: The capacity of the MMC with u.g.r. transfer
matrix is given by

$$C = \max_{p_{\mathbf{u}}} I^*(p_{\mathbf{u}}),$$

where $I^*(p_{\mathbf{u}})$, as defined in (11), is achieved by u.g.r. input,
and is given by

$$I^*(p_{\mathbf{u}}) = \sum_{v} p_{\mathbf{v}}(v) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m \times \ell})|}{p_{\mathbf{v}}(v)} - \sum_{u} h_u \, p_{\mathbf{u}}(u), \tag{12}$$

where

$$h_u = \sum_{v} p_{\mathbf{v}|\mathbf{u}}(v|u) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m \times u})|}{p_{\mathbf{v}|\mathbf{u}}(v|u)}. \tag{13}$$

From Theorem 4, we can see that the problem of finding the
capacity and the corresponding optimal input for the MMC
with u.g.r. transfer matrix, which was originally a convex
optimization problem over $q^{n\ell}$ variables (namely, $p_{\mathbf{X}}(X)$ for
$X \in \mathbb{F}_q^{n \times \ell}$), is simplified to another convex optimization
problem, this time involving only $n + 1$ variables (namely,
$p_{\mathbf{u}}(u)$, for $u = 0, \ldots, n$). The solution to this optimization
problem can be obtained by standard methods (see, e.g., [18]).
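
As one possible "standard method", the $(n+1)$-variable problem can be attacked with a Blahut-Arimoto style iteration. The sketch below is an assumption-laden illustration, not the authors' solver. It takes (9) for the rank transition probability and rewrites the objective (12) as $I(\mathbf{u}; \mathbf{v}) + \sum_u p_{\mathbf{u}}(u) C_u$, where $C_u = \sum_v p_{\mathbf{v}|\mathbf{u}}(v|u) \log_q \big( \begin{bmatrix} \ell \\ v \end{bmatrix}_q / \begin{bmatrix} u \\ v \end{bmatrix}_q \big)$; this rearrangement follows from (2) but is ours, and the names `rank_transition_matrix` and `ugr_capacity` are hypothetical.

```python
from math import log, prod


def gaussian_binomial(n, r, q):
    """Gaussian binomial coefficient [n, r]_q; 0 outside 0 <= r <= n."""
    if r < 0 or r > n:
        return 0
    return (prod(q**n - q**i for i in range(r))
            // prod(q**r - q**i for i in range(r)))


def rank_transition_matrix(p_r, n, q):
    """W[u][v] = sum_r p_r[r] * p(v | u, r), with p(v | u, r) taken from (9)."""
    W = [[0.0] * len(p_r) for _ in range(n + 1)]
    for u in range(n + 1):
        for r, prob_r in enumerate(p_r):
            for v in range(min(u, r) + 1):
                ways = gaussian_binomial(u, v, q) * gaussian_binomial(n - u, r - v, q)
                if ways:
                    ways *= q ** (v * (n - u - r + v))
                    W[u][v] += prob_r * ways / gaussian_binomial(n, r, q)
    return W


def ugr_capacity(p_r, n, ell, q, tol=1e-10, max_iter=100_000):
    """Return (capacity estimate in q-ary symbols, input rank distribution)."""
    W = rank_transition_matrix(p_r, n, q)

    def log_q(x):
        return log(x) / log(q)

    # Per-rank terms C_u (the constant-rank values for rank-u input).
    C = [sum(w * (log_q(gaussian_binomial(ell, v, q)) - log_q(gaussian_binomial(u, v, q)))
             for v, w in enumerate(W[u]) if w > 0)
         for u in range(n + 1)]
    p_u = [1.0 / (n + 1)] * (n + 1)
    value = 0.0
    for _ in range(max_iter):
        p_v = [sum(p_u[u] * W[u][v] for u in range(n + 1)) for v in range(len(p_r))]
        # gain[u] = C_u + D(W(.|u) || p_v), measured in q-ary symbols.
        gain = [C[u] + sum(w * log_q(w / p_v[v]) for v, w in enumerate(W[u]) if w > 0)
                for u in range(n + 1)]
        value = sum(p * g for p, g in zip(p_u, gain))  # objective at current p_u
        if max(gain) - value < tol:                    # max(gain) bounds the optimum
            break
        weights = [p * q ** g for p, g in zip(p_u, gain)]
        total = sum(weights)
        p_u = [w / total for w in weights]
    return value, p_u


if __name__ == "__main__":
    # Rank distribution for min(n, m) = 2; the numbers are arbitrary examples.
    capacity, p_u = ugr_capacity([0.1, 0.3, 0.6], n=2, ell=3, q=2)
    print(capacity, p_u)
```

The list `p_r` must have length $\min\{n, m\} + 1$. For a transfer matrix that is always full rank with $n = m$, the iteration settles at $\log_q \sum_{u=0}^{n} \begin{bmatrix} \ell \\ u \end{bmatrix}_q$, which gives a convenient sanity check.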

We now focus on the special situation in which the input 
matrices are restricted to have constant rank. This case is 
of interest for at least two reasons. First, constant-rank input
happens to be asymptotically optimal both in the packet length 
and in the field size (as we shall see next). And second, 
most of the existing practical constructions for subspace codes 
are "codes in the Grassmannian," that is, constant-dimension 
subspace codes [3].

Let $C_u$ denote the maximum channel mutual information
when the input is restricted to rank-$u$ matrices. Let $u^*$ denote
the value of $u$ that maximizes $C_u$, so that $C_{u^*} = \max_u C_u$.
We call $C_u$ the rank-$u$ capacity, and $C_{u^*}$ the constant-rank
capacity of the multiplicative finite-field matrix channel.

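Theorem 5: The rank-$u$ capacity of the MMC with u.g.r.
transfer matrix is achieved by the uniform [over $\mathcal{T}_u(\mathbb{F}_q^{n \times \ell})$]
input distribution, and is given by

$$C_u = \sum_{v} p_{\mathbf{v}|\mathbf{u}}(v|u) \log_q \frac{\begin{bmatrix} \ell \\ v \end{bmatrix}_q}{\begin{bmatrix} u \\ v \end{bmatrix}_q}. \tag{14}$$

Moreover,

$$C_{u^*} \le C \le C_{u^*} + \log_q(\min\{n, m\} + 1). \tag{15}$$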



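Remark: In particular, if the input is always full rank (i.e.,
$\mathbf{u} = n$), then $\mathbf{v} = \mathbf{r}$ (since $\mathbf{v} = \operatorname{rank} \mathbf{Y} = \operatorname{rank} \mathbf{G}\mathbf{X} =
\operatorname{rank} \mathbf{G} = \mathbf{r}$). The capacity becomes simply

$$C_n = \sum_{r} p_{\mathbf{r}}(r) \log_q \frac{\begin{bmatrix} \ell \\ r \end{bmatrix}_q}{\begin{bmatrix} n \\ r \end{bmatrix}_q},$$

a result obtained earlier in [2]. Moreover, since $p_{\mathbf{v}|\langle \mathbf{X} \rangle}(v|U)$
only depends on $U$ through $u = \dim U$ (see Theorem 3), our
result agrees with [9, Theorem 7].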

We next turn to the behavior of the channel for asymptotically
large packet length $\ell$, and asymptotically large field
size $q$. We show that, for both scenarios, constant-rank input
suffices to achieve the capacity.

Consider first the asymptotic behavior in the packet length $\ell$.
In this situation, it is appropriate to define $\bar{C} = C/\ell$,
the normalized capacity of the matrix channel, measured in
packets per channel use. We also define the normalized rank-$u$
capacity as $\bar{C}_u = C_u/\ell$, and the normalized constant-rank
capacity as $\bar{C}_{u^*}$, where $u^*$ is the value of $u$ that maximizes $\bar{C}_u$.

Theorem 6: Asymptotically in the packet length $\ell$, the
normalized capacity of the MMC with u.g.r. transfer matrix is
achieved with constant-rank uniform input, and is given by

$$\lim_{\ell \to \infty} \bar{C} = E[\mathbf{r}].$$

The optimal input rank is always $u^* = n$.

Remark: This result is also obtained in [9, Corollary 1] for
the case of an MMC with a general transfer matrix.
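
The convergence stated in Theorem 6 can be observed numerically for full-rank input. The sketch below assumes the expression $C_n = \sum_r p_{\mathbf{r}}(r) \log_q \big( \begin{bmatrix} \ell \\ r \end{bmatrix}_q / \begin{bmatrix} n \\ r \end{bmatrix}_q \big)$ from the remark following Theorem 5 (as reconstructed here); the rank distribution in the example and the function name are arbitrary illustrations.

```python
from math import log, prod


def gaussian_binomial(n, r, q):
    """Gaussian binomial coefficient [n, r]_q; 0 outside 0 <= r <= n."""
    if r < 0 or r > n:
        return 0
    return (prod(q**n - q**i for i in range(r))
            // prod(q**r - q**i for i in range(r)))


def normalized_full_rank_capacity(p_r, n, ell, q):
    """C_n / ell, in packets per channel use, for full-rank (rank-n) input."""
    total = 0.0
    for r, prob_r in enumerate(p_r):
        if prob_r > 0:
            # Logs are taken separately so that huge integers never become floats.
            total += prob_r * (log(gaussian_binomial(ell, r, q))
                               - log(gaussian_binomial(n, r, q)))
    return total / (log(q) * ell)


if __name__ == "__main__":
    p_r = [0.1, 0.2, 0.3, 0.4]  # example rank distribution with n = m = 3
    expected_rank = sum(r * p for r, p in enumerate(p_r))
    for ell in (4, 16, 64, 256, 1024):
        print(ell, normalized_full_rank_capacity(p_r, 3, ell, 2), expected_rank)
```

The gap to $E[\mathbf{r}]$ shrinks roughly like $1/\ell$, since $\log_q \begin{bmatrix} \ell \\ r \end{bmatrix}_q$ equals $r(\ell - r)$ up to a bounded term, by (4).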

We now turn to the asymptotic behavior in the field size $q$.
In a general situation, the rank distribution may depend on $q$
[for example, the case in (7)]. Thus, in what follows, we let

$$p_{\mathbf{r}}^{\infty}(r) = \lim_{q \to \infty} p_{\mathbf{r}}(r)$$

denote the limiting distribution of $\mathbf{r}$ in $q$, assuming such a limit
exists. Of course, when the rank distribution does not depend
on $q$, then $p_{\mathbf{r}}^{\infty}(r) = p_{\mathbf{r}}(r)$.

Theorem 7: Asymptotically in the field size $q$, the capacity
of the MMC with u.g.r. transfer matrix is achieved with
constant-rank uniform input, and is given by

$$\lim_{q \to \infty} C = \max_{u} \, (\ell - u) \sum_{r} p_{\mathbf{r}}^{\infty}(r) \min\{u, r\}.$$



Remark: Consider random linear network coding in the
absence of link errors and erasures. When the field size $q$ is
asymptotically large, it is known [7] that the transfer matrix
will have rank $h$ with probability approaching one, where $h$
is the network mincut. In this case, $p_r^\infty(r) = 1[r = h]$, so that

$$\lim_{q\to\infty} C = \max_u \left[ (\ell - u) \min\{u, h\} \right] = (\ell - u^*) u^*,$$

where $u^* = \min\{h, \lfloor \ell/2 \rfloor\}$. For the sub-case in which
$h = \min\{n, m\}$, we have $u^* = \min\{n, m, \lfloor \ell/2 \rfloor\}$, which agrees
with [5, Proposition 3] and [6, Theorem 2], since in both cases
$p_r^\infty(r) = 1[r = \min\{n, m\}]$ [see equations (6) and (7)].
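
The maximization in Theorem 7 is over a handful of integers, so it can be evaluated directly. The following is a minimal sketch, not part of the original paper: the function name `asymptotic_capacity_large_q`, the dictionary representation of the limiting rank distribution, and the search range $0 \le u \le \min\{n, \ell\}$ (the possible ranks of an $n \times \ell$ input) are assumptions made for illustration.

```python
from fractions import Fraction


def asymptotic_capacity_large_q(rank_dist, n, ell):
    """Maximize (ell - u) * sum_r p(r) * min(u, r) over the input rank u.

    rank_dist maps each rank r to its limiting probability (hypothetical
    representation). Returns (maximum value, maximizing u); ties are
    resolved in favor of the smallest u.
    """
    best_value, best_u = Fraction(0), 0
    for u in range(min(n, ell) + 1):
        expected_min = sum(Fraction(p) * min(u, r) for r, p in rank_dist.items())
        value = (ell - u) * expected_min
        if value > best_value:
            best_value, best_u = value, u
    return best_value, best_u


# Error-free network with mincut h = 3, n = 4 input packets of length ell = 10.
value, u_star = asymptotic_capacity_large_q({3: 1}, n=4, ell=10)
```

For a rank distribution concentrated on $h$, the maximizer returned by this sketch coincides with $\min\{h, \lfloor \ell/2 \rfloor\}$, as stated in the remark.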

Our last result is concerned with the optimality of subspace
coding [1] for the MMC with u.g.r. transfer matrix.
Let $\mathcal{P}(\mathbb{F}_q^\ell, d)$ denote the set of all subspaces of
$\mathbb{F}_q^\ell$ with dimension $d$ or less.

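Theorem 8: Consider the MMC with u.g.r. transfer matrix.
Define $U = \langle X \rangle$ and $V = \langle Y \rangle$. Then,

$$I(X; Y) = I(U; V), \tag{16}$$

for every input distribution $p_X$. Furthermore, for every
$U \in \mathcal{P}(\mathbb{F}_q^\ell, n)$ and $V \in \mathcal{P}(\mathbb{F}_q^\ell, m)$, we have

$$p_{V|U}(V|U) = |\mathcal{T}(\mathbb{F}_q^{m \times \dim V})| \, p_{Y|X}(Y|X), \tag{17}$$

where $X \in \mathbb{F}_q^{n\times\ell}$ and $Y \in \mathbb{F}_q^{m\times\ell}$ are any matrices such that
$\langle X \rangle = U$ and $\langle Y \rangle = V$.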

As a consequence of Theorem 8, the matrix channel

$$(\mathcal{X} = \mathbb{F}_q^{n\times\ell},\ p_{Y|X},\ \mathcal{Y} = \mathbb{F}_q^{m\times\ell})$$

can be transformed into a (simpler) subspace channel

$$(\mathcal{U} = \mathcal{P}(\mathbb{F}_q^\ell, n),\ p_{V|U},\ \mathcal{V} = \mathcal{P}(\mathbb{F}_q^\ell, m))$$

with channel transition probability $p_{V|U}$ given by (17). Concretely,
the new channel is obtained by concatenating the
original channel at the input with a device that takes a
subspace $U$ to any matrix $X$ such that $\langle X \rangle = U$, and at
the output with a device that computes $V = \langle Y \rangle$. Due
to (16), any coding scheme for the matrix channel has a
counterpart in the subspace channel achieving exactly the same
mutual information, and vice versa. In particular, one may
focus solely on $(\mathcal{U}, p_{V|U}, \mathcal{V})$ when designing and analyzing
capacity-achieving schemes.

VI. Proofs

This section presents the proofs omitted from Section V.
In order to preserve space, we will often drop the subscripts
of the probability distributions, writing, for example, $p(X)$
instead of $p_X(X)$. Before we proceed, we present a series of
matrix enumeration results that will prove useful throughout
this section.



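Lemma 9: Let $X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$ be given. The number of
matrices $Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$ such that $\langle Y \rangle \subseteq \langle X \rangle$ is given by

$$|\{Y \in \mathcal{T}_v : \langle Y \rangle \subseteq \langle X \rangle\}| = |\mathcal{T}_v(\mathbb{F}_q^{m\times u})|.$$

Now, let $Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$ be given. The number of matrices
$X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$ such that $\langle Y \rangle \subseteq \langle X \rangle$ is given by

$$|\{X \in \mathcal{T}_u : \langle Y \rangle \subseteq \langle X \rangle\}| = |\mathcal{T}_v(\mathbb{F}_q^{m\times u})| \, \frac{|\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})|}{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|}.$$

Proof: For every $X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$, define

$$\mathcal{J}(X) = \{Y \in \mathcal{T}_v : \langle Y \rangle \subseteq \langle X \rangle\}.$$

Let $X_1, X_2 \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$. Then, there exist invertible matrices
$S \in \mathbb{F}_q^{n\times n}$ and $T \in \mathbb{F}_q^{\ell\times\ell}$ such that $X_1 = S X_2 T$. It is not hard
to show that $Y \mapsto Y T^{-1}$ is a bijection between $\mathcal{J}(X_1)$ and
$\mathcal{J}(X_2)$, so that we must have $|\mathcal{J}(X_1)| = |\mathcal{J}(X_2)|$. Therefore,
to compute the value of $|\mathcal{J}(X)|$ we can set

$$X = \begin{bmatrix} I_u & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{F}_q^{n\times\ell},$$

where $I_u$ is the $u \times u$ identity matrix. Since $Y \in \mathcal{J}(X)$ if and
only if $Y$ is of the form $[Y_0 \ \ 0]$, where $Y_0 \in \mathcal{T}_v(\mathbb{F}_q^{m\times u})$, we
conclude that $|\mathcal{J}(X)| = |\mathcal{T}_v(\mathbb{F}_q^{m\times u})|$, as desired.

Now, for every $Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$, define

$$\mathcal{K}(Y) = \{X \in \mathcal{T}_u : \langle Y \rangle \subseteq \langle X \rangle\}.$$

Similarly to the previous paragraph, it is possible to show that
$|\mathcal{K}(Y_1)| = |\mathcal{K}(Y_2)|$ for every $Y_1, Y_2 \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$. Consider
then a bipartite graph where $X$s in $\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$ are the nodes in
the left-hand side, $Y$s in $\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$ are the nodes in the
right-hand side, and in which a node $X$ is connected with a node
$Y$ if and only if $\langle Y \rangle \subseteq \langle X \rangle$. The number of edges connected
with nodes in the left-hand side, namely, $|\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})| \, |\mathcal{J}(X)|$,
must be equal to the number of edges connected with nodes in
the right-hand side, namely, $|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})| \, |\mathcal{K}(Y)|$, from which
the second statement follows. ■

The next lemma is a combinatorial result by Brawley and
Carlitz [19].

Lemma 10: Let $G_0 \in \mathcal{T}_v(\mathbb{F}_q^{m\times u})$ be a given matrix. The
number of matrices $G \in \mathcal{T}_r(\mathbb{F}_q^{m\times n})$ whose left $m \times u$
sub-matrix is $G_0$ is given by

$$\phi_q(m,n,u,r,v) = \frac{|\mathcal{T}(\mathbb{F}_q^{m\times r})|}{|\mathcal{T}(\mathbb{F}_q^{m\times v})|} \, q^{v(n-u-r+v)} \begin{bmatrix} n-u \\ r-v \end{bmatrix}_q.$$

We now derive another basic enumeration result which is
closely related to the multiplicative finite-field matrix channel.

Lemma 11: Let $X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$ and $Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$. The
number of matrices $G \in \mathcal{T}_r(\mathbb{F}_q^{m\times n})$ such that $GX = Y$ is

$$|\{G \in \mathcal{T}_r : GX = Y\}| = \phi_q(m,n,u,r,v) \, 1[\langle Y \rangle \subseteq \langle X \rangle].$$

Proof: Let $X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$, $Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$, and define

$$\mathcal{J}(X, Y) = \{G \in \mathcal{T}_r : GX = Y\}.$$

If $\langle Y \rangle \not\subseteq \langle X \rangle$, then clearly $|\mathcal{J}(X,Y)| = 0$, since no $G$ can
take $X$ into $Y$. Suppose, then, that $\langle Y \rangle \subseteq \langle X \rangle$. Using a
similar argument as employed in the proof of Lemma 9, we
can conclude that it suffices to show the result for

$$X = \begin{bmatrix} I_u & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{F}_q^{n\times\ell},$$

where $I_u$ is the $u \times u$ identity matrix. For this particular $X$,
we must have $Y = [Y_0 \ \ 0]$ for some $Y_0 \in \mathcal{T}_v(\mathbb{F}_q^{m\times u})$ (recall
that $\langle Y \rangle \subseteq \langle X \rangle$ is assumed). On the other hand, we also have
$Y = GX = [G_0 \ \ 0]$, where $G_0 \in \mathbb{F}_q^{m\times u}$ is the left $m \times u$
sub-matrix of $G$. We thus have $G \in \mathcal{J}(X, Y)$ if and only if
$G \in \mathcal{T}_r(\mathbb{F}_q^{m\times n})$ and $G_0 = Y_0 \in \mathcal{T}_v(\mathbb{F}_q^{m\times u})$. The result now
follows from Lemma 10. ■

We are finally ready to prove the theorems.

Proof of Theorem 3: Let $X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$, $Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$,
and $r$ such that $0 \le r \le \min\{n, m\}$. We have

$$\begin{aligned}
p(Y|X,r) &= \sum_{G \in \mathcal{T}_r} p(G|r) \, p(Y|X,G) \\
&\overset{(a)}{=} \frac{1}{|\mathcal{T}_r(\mathbb{F}_q^{m\times n})|} \sum_{G \in \mathcal{T}_r} 1[Y = GX] \\
&\overset{(b)}{=} \frac{1}{|\mathcal{T}_r(\mathbb{F}_q^{m\times n})|} \, \phi_q(m,n,u,r,v) \, 1[\langle Y \rangle \subseteq \langle X \rangle],
\end{aligned}$$

where (a) follows because $G$ is u.g.r., and (b) follows from
Lemma 11. Therefore, from Lemma 9 we may write

$$p(v|X,r) = \sum_{Y \in \mathcal{T}_v} p(Y|X,r) = \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|}{|\mathcal{T}_r(\mathbb{F}_q^{m\times n})|} \, \phi_q(m,n,u,r,v),$$

so that,

$$\begin{aligned}
p(v|u,r) &= \sum_{X \in \mathcal{T}_u} p(X|u) \, p(v|X,r) \\
&= \sum_{X \in \mathcal{T}_u} p(X|u) \, \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|}{|\mathcal{T}_r(\mathbb{F}_q^{m\times n})|} \, \phi_q(m,n,u,r,v) \\
&= \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|}{|\mathcal{T}_r(\mathbb{F}_q^{m\times n})|} \, \phi_q(m,n,u,r,v),
\end{aligned}$$

and (10) follows by comparing the expressions for $p(Y|X,r)$
and $p(v|u,r)$. To prove the closed-form expression for $p(v|u,r)$,
we substitute $\phi_q(m,n,u,r,v)$ with its definition (see Lemma 10), to get

$$\begin{aligned}
p(v|u,r) &= \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})| \, |\mathcal{T}(\mathbb{F}_q^{m\times r})|}{|\mathcal{T}_r(\mathbb{F}_q^{m\times n})| \, |\mathcal{T}(\mathbb{F}_q^{m\times v})|} \, q^{v(n-u-r+v)} \begin{bmatrix} n-u \\ r-v \end{bmatrix}_q \\
&= q^{v(n-u-r+v)} \, \frac{\begin{bmatrix} u \\ v \end{bmatrix}_q \begin{bmatrix} n-u \\ r-v \end{bmatrix}_q}{\begin{bmatrix} n \\ r \end{bmatrix}_q},
\end{aligned}$$

where we used (2) in the last step.

To finish the proof, assume that $X$ is u.g.r. Then, for each
$Y \in \mathcal{T}_v(\mathbb{F}_q^{m\times\ell})$, we have

$$\begin{aligned}
p(Y) &= \sum_u \sum_{X \in \mathcal{T}_u} p(X) \, p(Y|X) \\
&\overset{(a)}{=} \sum_u \frac{p(u)}{|\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})|} \sum_{X \in \mathcal{T}_u} p(Y|X) \\
&\overset{(b)}{=} \sum_u \frac{p(u)}{|\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})|} \sum_{X \in \mathcal{T}_u} \frac{p(v|u)}{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|} \, 1[\langle Y \rangle \subseteq \langle X \rangle] \\
&\overset{(c)}{=} \sum_u \frac{p(u)}{|\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})|} \, \frac{p(v|u)}{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|} \, |\mathcal{T}_v(\mathbb{F}_q^{m\times u})| \, \frac{|\mathcal{T}_u(\mathbb{F}_q^{n\times\ell})|}{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|} \\
&= \frac{p(v)}{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|},
\end{aligned}$$

where (a) follows because $X$ is u.g.r., (b) follows from (10),
and (c) follows from Lemma 9. Therefore, $Y$ is also u.g.r., as
claimed. ■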
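
The closed form for $p(v|u,r)$ can be checked by exhaustive enumeration on a small case. The sketch below is an illustration added here, not the authors' code; it assumes $q = 2$, represents matrix rows as integer bitmasks, and uses the reduction from the proof of Lemma 11, namely that $v$ is the rank of the left $m \times u$ sub-matrix of a transfer matrix drawn uniformly among those of rank $r$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gaussian_binomial(n, k, q):
    """Gaussian binomial coefficient [n, k]_q (zero outside 0 <= k <= n)."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def p_v_given_u_r(v, u, r, n, q):
    """Closed form q^{v(n-u-r+v)} [u,v]_q [n-u,r-v]_q / [n,r]_q."""
    num = gaussian_binomial(u, v, q) * gaussian_binomial(n - u, r - v, q)
    if num == 0:
        return Fraction(0)
    return Fraction(q ** (v * (n - u - r + v)) * num, gaussian_binomial(n, r, q))


def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)  # clear the leading bit of b if it is set
        if row:
            basis.append(row)
    return len(basis)


def empirical_p_v_given_u_r(m, n, u):
    """Enumerate all m x n binary matrices G; tabulate the rank v of the
    left m x u sub-matrix conditioned on rank(G) = r."""
    counts = {}
    for rows in product(range(2 ** n), repeat=m):
        r = rank_gf2(rows)
        v = rank_gf2([row >> (n - u) for row in rows])
        counts[(r, v)] = counts.get((r, v), 0) + 1
    totals = {}
    for (r, v), c in counts.items():
        totals[r] = totals.get(r, 0) + c
    return {(r, v): Fraction(c, totals[r]) for (r, v), c in counts.items()}


empirical = empirical_p_v_given_u_r(m=3, n=3, u=2)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison between the enumeration and the closed form is an equality rather than a floating-point approximation.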

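Proof of Theorem 4: For each $X \in \mathcal{T}_u(\mathbb{F}_q^{n\times\ell})$, we have

$$\begin{aligned}
H(Y|X = X) &= \sum_v \sum_{Y \in \mathcal{T}_v} p(Y|X) \log_q \frac{1}{p(Y|X)} \\
&= \sum_v p(v|u) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|}{p(v|u)} = h_u,
\end{aligned}$$

where we substituted $p(Y|X)$ as in (10). Averaging over all
$X \in \mathbb{F}_q^{n\times\ell}$, we get

$$\begin{aligned}
H(Y|X) &= \sum_u \sum_{X \in \mathcal{T}_u} H(Y|X = X) \, p(X) \\
&= \sum_u h_u \sum_{X \in \mathcal{T}_u} p(X) \\
&= \sum_u p(u) \, h_u,
\end{aligned}$$

which depends on $p_X$ only through $p_u$. Therefore,

$$\begin{aligned}
I^*(p_u) &= \max_{p_X : p_u} I(X; Y) \\
&= \max_{p_X : p_u} \left[ H(Y) - H(Y|X) \right] \\
&= \left[ \max_{p_X : p_u} H(Y) \right] - \sum_u p(u) \, h_u,
\end{aligned}$$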

and we get the desired result from Q. ■ 

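Proof of Theorem 5: If the input is restricted to rank-$u$
matrices, then $\mathsf{u} = u$ is a constant, and therefore $p(v) =
p(v|u)$. The channel mutual information given by Theorem 4
simplifies to

$$C_u = \sum_v p(v|u) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|}{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|},$$

and we get (14) by applying (2).

The lower bound of (13) is immediate. Similarly to Yang et
al. in [9, Lemma 4], we can rewrite the mutual information (12) as

$$\begin{aligned}
I^*(p_u) &= \sum_v p(v) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|}{p(v)} - \sum_u p(u) \, h_u \\
&= \sum_{u,v} p(u) p(v|u) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|}{p(v)} - \sum_{u,v} p(u) p(v|u) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|}{p(v|u)} \\
&= \sum_{u,v} p(u) p(v|u) \log_q \frac{|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})|}{|\mathcal{T}_v(\mathbb{F}_q^{m\times u})|} + \sum_{u,v} p(u) p(v|u) \log_q \frac{p(v|u)}{p(v)} \\
&= \sum_u p(u) \, C_u + I(\mathsf{u}; \mathsf{v}),
\end{aligned}$$

where $I(\mathsf{u}; \mathsf{v})$ is the mutual information between the random
variables $\mathsf{u}$ and $\mathsf{v}$. The upper bound of (15) then follows
because $\sum_u p(u) C_u \le \max_u C_u = C_{u^*}$ and $I(\mathsf{u}; \mathsf{v}) \le
\log_q(\min\{n, m\} + 1)$. ■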
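
The constant-rank capacity and the two bounds used in this proof are straightforward to evaluate numerically. The following sketch assumes that (2) turns the ratio $|\mathcal{T}_v(\mathbb{F}_q^{m\times\ell})| / |\mathcal{T}_v(\mathbb{F}_q^{m\times u})|$ into a ratio of Gaussian coefficients, and that $p(v|u) = \sum_r p_r(r)\, p(v|u,r)$ with the closed form from the proof of Theorem 3. The function names and the dictionary encoding of the rank distribution are illustrative assumptions, not notation from the paper.

```python
import math
from fractions import Fraction


def gaussian_binomial(n, k, q):
    """Gaussian binomial coefficient [n, k]_q (zero outside 0 <= k <= n)."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def p_v_given_u_r(v, u, r, n, q):
    """Closed form q^{v(n-u-r+v)} [u,v]_q [n-u,r-v]_q / [n,r]_q."""
    num = gaussian_binomial(u, v, q) * gaussian_binomial(n - u, r - v, q)
    if num == 0:
        return Fraction(0)
    return Fraction(q ** (v * (n - u - r + v)) * num, gaussian_binomial(n, r, q))


def constant_rank_capacity(u, rank_dist, n, ell, q):
    """C_u = sum_v p(v|u) log_q([ell,v]_q / [u,v]_q), in q-ary symbols."""
    total = 0.0
    for v in range(u + 1):
        p_v = sum(p * p_v_given_u_r(v, u, r, n, q) for r, p in rank_dist.items())
        if p_v > 0:
            log_ratio = math.log(gaussian_binomial(ell, v, q)) - math.log(
                gaussian_binomial(u, v, q)
            )
            total += float(p_v) * log_ratio / math.log(q)
    return total


def capacity_bounds(rank_dist, n, m, ell, q):
    """Lower bound C_{u*} and upper bound C_{u*} + log_q(min{n, m} + 1)."""
    c_star = max(
        constant_rank_capacity(u, rank_dist, n, ell, q)
        for u in range(min(n, ell) + 1)
    )
    return c_star, c_star + math.log(min(n, m) + 1, q)


# Transfer matrix always invertible (n = m = 2, rank 2), packets of length 4 over F_2.
lower, upper = capacity_bounds({2: 1}, n=2, m=2, ell=4, q=2)
```

In this example every input subspace is received intact, so the best constant-rank scheme uses all 35 two-dimensional subspaces of $\mathbb{F}_2^4$ and the lower bound equals $\log_2 35$.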

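Proof of Theorem 6: Dividing (15) by $\ell$ and taking the
limit when $\ell \to \infty$, we obtain

$$\lim_{\ell\to\infty} \bar{C} = \lim_{\ell\to\infty} \bar{C}_{u^*},$$

so that constant-rank input is sufficient to achieve capacity for
asymptotically large $\ell$. Now, dividing (14) by $\ell$, and taking
the limit when $\ell \to \infty$, we obtain

$$\begin{aligned}
\lim_{\ell\to\infty} \bar{C}_u &= \sum_v p(v|u) \left( \lim_{\ell\to\infty} \frac{1}{\ell} \log_q \frac{\begin{bmatrix} \ell \\ v \end{bmatrix}_q}{\begin{bmatrix} u \\ v \end{bmatrix}_q} \right) \\
&= \sum_v p(v|u) \left( \lim_{\ell\to\infty} \frac{1}{\ell} \log_q \begin{bmatrix} \ell \\ v \end{bmatrix}_q \right) \\
&= \sum_v p(v|u) \, v = E[\mathsf{v} \,|\, \mathsf{u} = u],
\end{aligned}$$

where the first equality in the last line is a consequence of the
bounds on the Gaussian coefficient. Finally, since $v \le r$, we have

$$E[\mathsf{v} \,|\, \mathsf{u} = u, \mathsf{r} = r] \le r = E[\mathsf{v} \,|\, \mathsf{u} = n, \mathsf{r} = r],$$

for all $u \in \{0, \ldots, n\}$. Multiplying both sides by $p(r)$ and
summing over $r$, we obtain

$$E[\mathsf{v} \,|\, \mathsf{u} = u] \le E[\mathsf{r}] = E[\mathsf{v} \,|\, \mathsf{u} = n],$$

which shows that $\lim_{\ell\to\infty} \bar{C}_u = E[\mathsf{v} \,|\, \mathsf{u} = u]$ is maximum
when $u = n$, with the maximum value being $E[\mathsf{r}]$. ■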

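For the next result, we will need the following intuitive fact.

Lemma 12: We have

$$\lim_{q\to\infty} p(v|u,r) = \begin{cases} 1, & \text{if } v = \min\{u, r\}, \\ 0, & \text{else.} \end{cases}$$

Proof: This is clearly true if $v > \min\{u, r\}$. When $v \le
\min\{u, r\}$, we have from the bounds on the Gaussian coefficient
and from Theorem 3 that

$$\begin{aligned}
q^{v(u-v)} \, \gamma_q^{-1} q^{-r(n-r)} \, q^{(r-v)(n-u-r+v)} \, q^{v(n-u-r+v)}
&\le \begin{bmatrix} u \\ v \end{bmatrix}_q \begin{bmatrix} n \\ r \end{bmatrix}_q^{-1} \begin{bmatrix} n-u \\ r-v \end{bmatrix}_q q^{v(n-u-r+v)} \\
&\le \gamma_q q^{v(u-v)} \, q^{-r(n-r)} \, \gamma_q q^{(r-v)(n-u-r+v)} \, q^{v(n-u-r+v)}.
\end{aligned}$$

After simplifying, we get

$$\gamma_q^{-1} q^{-(u-v)(r-v)} \le p(v|u,r) \le \gamma_q^{2} q^{-(u-v)(r-v)},$$

and the desired result follows because $\lim_{q\to\infty} \gamma_q = 1$. ■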

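Proof of Theorem 7: The quantity $\log_q(\min\{n, m\} + 1)$
in the right-hand side of (13) goes to zero as $q \to \infty$, so that

$$\lim_{q\to\infty} C = \lim_{q\to\infty} C_{u^*},$$

that is, constant-rank input suffices for asymptotically large $q$.
Now, from (14), we have

$$\lim_{q\to\infty} C_u = \sum_v \left( \lim_{q\to\infty} p(v|u) \right) \left( \lim_{q\to\infty} \log_q \frac{\begin{bmatrix} \ell \\ v \end{bmatrix}_q}{\begin{bmatrix} u \\ v \end{bmatrix}_q} \right).$$

For the first parenthesis, we have from Lemma 12 that

$$\lim_{q\to\infty} p(v|u) = \sum_r p_r^\infty(r) \, 1[v = \min\{u, r\}].$$

For the second parenthesis, we have from the bounds on the
Gaussian coefficient that

$$\lim_{q\to\infty} \log_q \frac{\begin{bmatrix} \ell \\ v \end{bmatrix}_q}{\begin{bmatrix} u \\ v \end{bmatrix}_q} = v(\ell - u).$$

Therefore,

$$\begin{aligned}
\lim_{q\to\infty} C_u &= \sum_v \sum_r p_r^\infty(r) \, 1[v = \min\{u, r\}] \, v(\ell - u) \\
&= (\ell - u) \sum_r p_r^\infty(r) \sum_v 1[v = \min\{u, r\}] \, v \\
&= (\ell - u) \sum_r p_r^\infty(r) \min\{u, r\},
\end{aligned}$$

as desired. ■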

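Proof of Theorem 8: From Theorem 3 we know that
$p_{Y|X}(Y|X)$ depends on $X$ and $Y$ only through $\langle X \rangle$ and $\langle Y \rangle$.
Therefore, according to Lemma 1, the maps $f(X) = \langle X \rangle$ and
$g(Y) = \langle Y \rangle$ are information-lossless. This proves (16).

To prove (17), we first apply the input grouping to the
original matrix channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$, to get an intermediate
channel $(\mathcal{U}, p_{Y|U}, \mathcal{Y})$, with $p_{Y|U}(Y|U) = p_{Y|X}(Y|X)$, where
$X$ is such that $\langle X \rangle = U$. Then, we apply the output grouping
to this intermediate channel to get the subspace channel
$(\mathcal{U}, p_{V|U}, \mathcal{V})$ with

$$\begin{aligned}
p_{V|U}(V|U) &= \sum_{Y' : \langle Y' \rangle = V} p_{Y|U}(Y'|U) \\
&= |\mathcal{T}(\mathbb{F}_q^{m \times \dim V})| \, p_{Y|U}(Y|U),
\end{aligned}$$

where $Y$ is such that $\langle Y \rangle = V$. Note that the last step in the
above equation follows from

$$|\{Y' \in \mathbb{F}_q^{m\times\ell} : \langle Y' \rangle = V\}| = |\mathcal{T}(\mathbb{F}_q^{m \times \dim V})|,$$

which is true because associated with every $Y' \in \mathbb{F}_q^{m\times\ell}$
such that $\langle Y' \rangle = V$, there is a unique full-rank matrix $T \in
\mathcal{T}(\mathbb{F}_q^{m \times \dim V})$ such that $Y' = T Y_0$, where $Y_0 \in \mathcal{T}(\mathbb{F}_q^{\dim V \times \ell})$
is any fixed full-rank matrix satisfying $\langle Y_0 \rangle = V$. ■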
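
The counting identity used in the last step can be confirmed by brute force for a small field. The sketch below is an added illustration under the assumption $q = 2$, with rows encoded as integer bitmasks and a row space represented by the set of all its vectors; the helper names are hypothetical.

```python
from itertools import product


def row_space(rows):
    """Row space over F_2, returned as a frozenset of bitmask vectors."""
    span = {0}
    for row in rows:
        span |= {s ^ row for s in span}
    return frozenset(span)


def full_rank_count(m, d, q=2):
    """Number of full-rank m x d matrices over F_q, for d <= m."""
    count = 1
    for i in range(d):
        count *= q ** m - q ** i
    return count


def matrices_per_row_space(m, ell):
    """Group all m x ell binary matrices by row space and count each group."""
    counts = {}
    for rows in product(range(2 ** ell), repeat=m):
        space = row_space(rows)
        counts[space] = counts.get(space, 0) + 1
    return counts


counts = matrices_per_row_space(m=3, ell=3)
# A subspace with 2^d vectors has dimension d = bit_length(2^d) - 1.
matches = all(
    c == full_rank_count(3, len(space).bit_length() - 1)
    for space, c in counts.items()
)
```

Every subspace of dimension $d$ is hit by the same number of matrices, which is why the output grouping multiplies the transition probability by a factor depending only on $\dim V$.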



VII. Conclusions 

This work has considered probabilistic multiplicative finite- 
field matrix channels in which the transfer matrix is uniformly 
distributed conditioned on its rank. We advocate the applica- 
tion of this channel model in practical noncoherent network 
coding systems subject to link erasures, for we believe it 
is flexible enough to capture the essential characteristics of 
the system, while still being mathematically tractable. This 
contrasts with previously considered channel models, which 
are either too restrictive or too complex. 

As contributions, we have shown that the problem of finding
the channel capacity can be reduced to a convex optimization
problem on $n + 1$ variables (rather than $q^{n\ell}$), allowing for easy
numerical computation by standard techniques. We have also
specialized our results to the important case of constant-rank 
input, in which we were able to find a closed-form expression 
for the capacity. For asymptotically large field or packet length, 
we have shown that constant-rank input is optimal. Finally, we 
have proven that even in our more general setup, subspace 
coding is still sufficient to achieve capacity. Many of our 
results generalize existing conclusions in prior literature. 

The present paper has focused mainly on the capacity and 
mutual information of the multiplicative finite-field matrix 
channel. The design of low-complexity capacity-achieving 
schemes for this channel is an important and still largely 
open problem. Recent work by Yang et al. [9], [12] has
addressed this problem by considering the construction of
codes based on the expected value of the rank of the transfer
matrix, $E[\mathsf{r}]$. Nevertheless, the design of codes based on the
rank distribution $p_r$ is yet to be investigated. Finally, another
challenging and interesting research line motivated by the 
present work is the computation of the rank distribution as 
a function of a given network topology. 

Appendix A
A Variation of the Crypto Lemma²

We start by recalling the following well-known result,
known as the crypto lemma for the case of finite groups [11].

Lemma 13: Let $(\mathcal{G}, \cdot)$ be a finite group. Let $\mathsf{y} = \mathsf{g} \cdot \mathsf{x}$,
where $\mathsf{x}$ and $\mathsf{g}$ are random variables over $\mathcal{G}$, and $\mathsf{g}$ is uniform
over $\mathcal{G}$ and independent of $\mathsf{x}$. Then, $\mathsf{y}$ is uniform over $\mathcal{G}$ and
independent of $\mathsf{x}$.

Now, let $\mathcal{S}$ be a set. Recall that a (left) group action of $\mathcal{G}$
on $\mathcal{S}$ is a binary operator $\circ : \mathcal{G} \times \mathcal{S} \to \mathcal{S}$ such that
$(g_1 \cdot g_2) \circ x = g_1 \circ (g_2 \circ x)$, for all $g_1, g_2 \in \mathcal{G}$ and $x \in \mathcal{S}$;
and $e \circ x = x$, for all $x \in \mathcal{S}$, where $e$ is the identity element of $\mathcal{G}$.
Every group $\mathcal{G}$ acts on itself ($\mathcal{S} = \mathcal{G}$) by left multiplication, that is,
through the action given by $g \circ x = g \cdot x$. This appendix generalizes the
crypto lemma from this special case to the case of an arbitrary
action of $\mathcal{G}$ on some finite set $\mathcal{S}$. Before we proceed, we need
to recall a few basic facts about group actions [20, §4.1].

² This appendix is a joint work with Chen Feng.

For every $x \in \mathcal{S}$, the orbit of $\mathcal{G}$ containing $x$ is defined as
$\mathcal{G} \circ x = \{g \circ x : g \in \mathcal{G}\}$. The relation on $\mathcal{S}$ defined by

$$x \sim y \quad \text{iff} \quad x = g \circ y \text{ for some } g \in \mathcal{G}$$

is an equivalence relation. We have $x \sim y$ iff $\mathcal{G} \circ x = \mathcal{G} \circ y$ iff
$x$ and $y$ are in the same orbit. The size of each orbit is given
by $|\mathcal{G} \circ x| = |\mathcal{G}| / |\mathcal{G}_{x,x}|$, where $\mathcal{G}_{x,x} = \{g \in \mathcal{G} : g \circ x = x\}$ is
the stabilizer of $x$ in $\mathcal{G}$ (a subgroup of $\mathcal{G}$). An action is called
transitive if there is only one orbit.

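Lemma 14: Let $(\mathcal{G}, \cdot)$ be a finite group, $\mathcal{S}$ a finite set, and
$\circ : \mathcal{G} \times \mathcal{S} \to \mathcal{S}$ a group action of $\mathcal{G}$ on $\mathcal{S}$. Let $\mathsf{y} = \mathsf{g} \circ \mathsf{x}$ (so
that $\mathsf{x}$ and $\mathsf{y}$ lie in the same orbit), where $\mathsf{x}$ and $\mathsf{g}$ are random
variables over $\mathcal{S}$ and $\mathcal{G}$, respectively, and $\mathsf{g}$ is uniform over $\mathcal{G}$
and independent of $\mathsf{x}$. Then, $\mathsf{y}$ is piece-wise uniform over the
orbits of the action and conditionally independent of $\mathsf{x}$ given
that a particular orbit occurs.

Remark: In particular, if the action is transitive, then $\mathsf{y}$ is
uniform over $\mathcal{S}$ and independent of $\mathsf{x}$. This is the case of the
action $g \circ x = g \cdot x$, so we recover Lemma 13.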

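Proof: Since $\mathsf{g}$ is uniform and independent of $\mathsf{x}$, we have
that, for all $x, y \in \mathcal{S}$,

$$p_{\mathsf{y}|\mathsf{x}}(y|x) = \frac{|\mathcal{G}_{x,y}|}{|\mathcal{G}|},$$

where $\mathcal{G}_{x,y} = \{g \in \mathcal{G} : g \circ x = y\}$. If $x \sim y$ (so that
$\mathcal{G} \circ x = \mathcal{G} \circ y$), it can be shown that $\mathcal{G}_{x,y}$ is a coset of the
stabilizer $\mathcal{G}_{x,x}$, which implies $|\mathcal{G}_{x,y}| = |\mathcal{G}_{x,x}|$, and thus

$$p_{\mathsf{y}|\mathsf{x}}(y|x) = \frac{|\mathcal{G}_{x,x}|}{|\mathcal{G}|} = \frac{1}{|\mathcal{G} \circ x|} = \frac{1}{|\mathcal{G} \circ y|}.$$

On the other hand, if $x \not\sim y$, then clearly $p_{\mathsf{y}|\mathsf{x}}(y|x) = 0$.
Therefore,

$$\begin{aligned}
p_{\mathsf{y}}(y) &= \sum_x p_{\mathsf{y}|\mathsf{x}}(y|x) \, p_{\mathsf{x}}(x) \\
&= \frac{1}{|\mathcal{G} \circ y|} \sum_{x : x \sim y} p_{\mathsf{x}}(x) \\
&= \frac{\Pr[\mathsf{x} \sim y]}{|\mathcal{G} \circ y|} = \frac{\Pr[\mathsf{y} \sim y]}{|\mathcal{G} \circ y|},
\end{aligned}$$

and the lemma follows. ■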

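Theorem 2 is a corollary of this result.

Proof of Theorem 2: The result follows after applying
Lemma 14 with $\mathcal{G} = \mathcal{T}(\mathbb{F}_q^{m\times m}) \times \mathcal{T}(\mathbb{F}_q^{n\times n})$, where the
operation is $(T_1', T_2') \cdot (T_1, T_2) = (T_1' T_1, T_2 T_2')$, $\mathcal{S} = \mathbb{F}_q^{m\times n}$,
and $\circ : \mathcal{G} \times \mathcal{S} \to \mathcal{S}$ defined by $(T_1, T_2) \circ M = T_1 M T_2$.
The facts that $(\mathcal{G}, \cdot)$ is a group and $\circ$ is an action of $\mathcal{G}$ on $\mathcal{S}$
follow from basic linear algebra; the orbits, in this case, are
$\{\mathcal{T}_r(\mathbb{F}_q^{m\times n}) : r = 0, \ldots, \min\{n, m\}\}$, which are completely
characterized by the rank of $G$. ■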
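
Lemma 14 and its use in this proof can be illustrated by exact enumeration on the smallest non-trivial instance. The sketch below is an assumption-laden example rather than anything from the paper: it fixes $q = 2$ and $m = n = 2$, encodes matrices as tuples of row tuples, and checks that for a fixed input $x$ and a uniform group element the output $T_1 x T_2$ is uniform over the matrices of the same rank as $x$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def matmul(a, b):
    """Product of two 2 x 2 matrices over F_2 (tuples of row tuples)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) % 2 for j in range(2))
        for i in range(2)
    )


def rank(a):
    """Rank of a 2 x 2 matrix over F_2."""
    if (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % 2:
        return 2
    return 1 if any(any(row) for row in a) else 0


MATRICES = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
INVERTIBLE = [t for t in MATRICES if rank(t) == 2]
# Group elements are pairs (T1, T2) acting by (T1, T2) o M = T1 M T2.
GROUP = list(product(INVERTIBLE, INVERTIBLE))


def output_distribution(x):
    """Exact distribution of T1 x T2 when (T1, T2) is uniform over GROUP."""
    counts = Counter(matmul(matmul(t1, x), t2) for t1, t2 in GROUP)
    return {y: Fraction(c, len(GROUP)) for y, c in counts.items()}


rank_one_input = ((1, 0), (0, 0))
dist = output_distribution(rank_one_input)
```

For the rank-one input above, the 36 group elements spread evenly over the 9 rank-one matrices, so each output has probability 1/9 regardless of which rank-one input was chosen.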